{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": [],
      "authorship_tag": "ABX9TyNo7/eDIuUXulqI4aKGLXrI",
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/sugarforever/LangChain-Tutorials/blob/main/OpenAI_Chat_Completions_16k.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "# OpenAI new chat model gpt-3.5-turbo-16k\n",
        "\n",
        "gpt-3.5-turbo-16k offers 4 times the context length of gpt-3.5-turbo at twice the price: $0.003/1K input tokens and $0.004/1K output tokens.\n",
        "\n",
        "16k context means the model can now support ~20 pages of text in a single request."
      ],
      "metadata": {
        "id": "BW1vmhBS4fKN"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "!pip install tiktoken langchain openai --quiet --upgrade"
      ],
      "metadata": {
        "id": "EbRu-u0F4SaZ"
      },
      "execution_count": 31,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "import os\n",
        "import openai\n",
        "\n",
        "openai.api_key = \"Your OpenAI API Key\""
      ],
      "metadata": {
        "id": "QuGN31Hi4Ghh"
      },
      "execution_count": 32,
      "outputs": []
    },
    {
      "cell_type": "code",
      "execution_count": 33,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "_S3w-_dYLLyh",
        "outputId": "59bef309-1f5f-440c-ff19-5bbdee442aee"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "--2023-06-15 21:20:37--  https://uniswap.org/whitepaper-v3.pdf\n",
            "Resolving uniswap.org (uniswap.org)... 104.18.23.54, 104.18.22.54, 2606:4700::6812:1736, ...\n",
            "Connecting to uniswap.org (uniswap.org)|104.18.23.54|:443... connected.\n",
            "HTTP request sent, awaiting response... 200 OK\n",
            "Length: 1500865 (1.4M) [application/pdf]\n",
            "Saving to: ‘whitepaper-v3.pdf.1’\n",
            "\n",
            "\rwhitepaper-v3.pdf.1   0%[                    ]       0  --.-KB/s               \rwhitepaper-v3.pdf.1 100%[===================>]   1.43M  --.-KB/s    in 0.02s   \n",
            "\n",
            "2023-06-15 21:20:37 (65.6 MB/s) - ‘whitepaper-v3.pdf.1’ saved [1500865/1500865]\n",
            "\n"
          ]
        }
      ],
      "source": [
        "!wget https://uniswap.org/whitepaper-v3.pdf"
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "!ls -alt whitepaper-v3.pdf"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "9FhyJVTwLVsu",
        "outputId": "48e7c8d6-bbfd-4d09-e3fd-60f0a4dd88de"
      },
      "execution_count": 34,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "-rw-r--r-- 1 root root 1500865 Jun 15 20:28 whitepaper-v3.pdf\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "import tiktoken\n",
        "\n",
        "encoding = tiktoken.encoding_for_model('gpt-3.5-turbo-16k')\n",
        "encoding"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "SWdGZYFJs9rL",
        "outputId": "9b50cf76-7bfe-4005-9272-3d9856e37137"
      },
      "execution_count": 35,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "<Encoding 'cl100k_base'>"
            ]
          },
          "metadata": {},
          "execution_count": 35
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "encoding.encode(\"tiktoken is great!\")"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "_4R-IPTdtTep",
        "outputId": "a37a348d-b911-4797-dec4-032002ed8043"
      },
      "execution_count": 36,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "[83, 1609, 5963, 374, 2294, 0]"
            ]
          },
          "metadata": {},
          "execution_count": 36
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "!pip install PyPDF2 pymupdf --quiet"
      ],
      "metadata": {
        "id": "2F2s9bLpuXXK"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "from langchain.document_loaders import PyMuPDFLoader\n",
        "\n",
        "pages = PyMuPDFLoader('whitepaper-v3.pdf').load()\n",
        "\n",
        "all_content = ''\n",
        "for page in pages:\n",
        "  all_content += page.page_content + \"\\n\""
      ],
      "metadata": {
        "id": "d7QDq7P1ubQw"
      },
      "execution_count": 37,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "print(len(all_content))"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "5SeDEippv4wM",
        "outputId": "ccb48270-6280-40ed-8a86-8e876c42a036"
      },
      "execution_count": 38,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "41711\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "tokens = encoding.encode(all_content)\n",
        "len(tokens)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "jnXy3800wU4d",
        "outputId": "f3601eb2-a3ee-47f9-e9bb-b8306e634ae3"
      },
      "execution_count": 39,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "12377"
            ]
          },
          "metadata": {},
          "execution_count": 39
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "completion = openai.ChatCompletion.create(\n",
        "  model=\"gpt-3.5-turbo-16k\",\n",
        "  messages = [\n",
        "    {\"role\": \"system\", \"content\" : \"You are a chatbot which can answer questions based on the paper user supplies.\"},\n",
        "    {\"role\": \"user\", \"content\" : \"I have the following whitepaper of uniswap v3\"},\n",
        "    {\"role\": \"user\", \"content\" : all_content},\n",
        "    {\"role\": \"user\", \"content\" : \"What is liquidity oracle?\"}\n",
        "  ]\n",
        ")\n",
        "print(completion)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "Kk2kQK2Gwag5",
        "outputId": "dcc5d468-e2b7-4623-ef91-02c9ce4bc2ce"
      },
      "execution_count": 44,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "{\n",
            "  \"id\": \"chatcmpl-7Rope0Wps0WHHCnQDAw1ZIZzQbWtM\",\n",
            "  \"object\": \"chat.completion\",\n",
            "  \"created\": 1686864610,\n",
            "  \"model\": \"gpt-3.5-turbo-16k-0613\",\n",
            "  \"choices\": [\n",
            "    {\n",
            "      \"index\": 0,\n",
            "      \"message\": {\n",
            "        \"role\": \"assistant\",\n",
            "        \"content\": \"A liquidity oracle is a mechanism that provides information about the available liquidity in a particular market or trading pair. In the context of decentralized exchanges (DEXs) like Uniswap, a liquidity oracle provides data on the depth and volume of liquidity available for different token pairs.\\n\\nThe liquidity oracle collects and aggregates data from various liquidity sources, such as liquidity providers and market makers, to calculate the overall liquidity in a specific trading pair. It helps traders and users of DEXs to make informed decisions by providing insights into the liquidity landscape.\\n\\nThe data provided by a liquidity oracle can be used in various ways, including calculating slippage for trades, determining price impact, and assessing the overall health and stability of a market. Liquidity oracles play a crucial role in ensuring that users have access to reliable liquidity information, which is essential for the efficient functioning of decentralized exchanges.\"\n",
            "      },\n",
            "      \"finish_reason\": \"stop\"\n",
            "    }\n",
            "  ],\n",
            "  \"usage\": {\n",
            "    \"prompt_tokens\": 12429,\n",
            "    \"completion_tokens\": 173,\n",
            "    \"total_tokens\": 12602\n",
            "  }\n",
            "}\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "completion['choices'][0]['message']['content']"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 109
        },
        "id": "yHNo3vqpxwqw",
        "outputId": "1ec59f1b-5f37-4ce6-9461-0beb58350b0f"
      },
      "execution_count": 42,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "'A liquidity oracle is a mechanism or system that provides information about the liquidity available in a specific market or trading pair. It helps users and smart contracts determine the depth and availability of liquidity for trading or other purposes.\\n\\nIn the context of Uniswap v3, the liquidity oracle is a time-weighted average liquidity oracle. It provides information about the average liquidity for a given pair of tokens over a specific period of time. It allows users to query the recent liquidity accumulator values, eliminating the need to checkpoint the accumulator value at the exact beginning and end of the period for which a time-weighted average liquidity is being measured.\\n\\nThe liquidity oracle provides valuable information for various purposes, such as optimizing trading strategies, determining slippage, or making decisions related to liquidity provisioning and mining programs.'"
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "string"
            }
          },
          "metadata": {},
          "execution_count": 42
        }
      ]
    }
  ]
}